The JMU Reddit data set contains texts from student users discussing topics related to their university and its surrounding area. The data includes geographic locations, allowing us to connect online dialogue to specific on-campus events, spaces, and places. Common topics include student life, dorms, dining halls, and campus events, reflecting a positive tone toward the university and its community. In comparison, ODU's data set includes Reddit texts that go beyond the campus and focus more on the city of Norfolk and the Hampton Roads region. The ODU section deals with off-campus issues like housing, parking, crime, and neighborhoods, and as a result its tone is more negative.
Together, these two corpus descriptions and data sets highlight a difference in focus between JMU's campus life and ODU's off-campus environment. This leads to the hypothesis that ODU's student dialogue focuses more on off-campus issues such as Norfolk neighborhoods, crime, and housing: ODU's data shows more negative statements associated with off-campus terms like Norfolk, Hampton Roads, graffiti, and parking. Meanwhile, JMU's student dialogue focuses heavily on on-campus spaces like campus buildings, dorms, dining halls, and student life, with more positive statements associated with on-campus terms like the Quad, dorms, and certain campus events.
After examining both data sets, it appears that ODU's Reddit corpus emphasizes off-campus life and community concerns, while JMU's corpus focuses on internal campus experiences. In ODU's dataset, the student dialogue references Norfolk neighborhoods, safety concerns, crime, parking, and housing difficulties, suggesting that students engage more with topics reflecting the urban environment surrounding their university. The tone of these discussions tends to be more negative, driven by frustration with off-campus living conditions around ODU. In comparison, the JMU data set surfaces campus landmarks, dorms, student activities, and dining halls, which creates a more positive tone around the social and physical spaces of campus life and reflects a stronger sense of community built on on-campus experiences. This comparison between the two universities supports the hypothesis that ODU students' discussions are shaped by off-campus challenges, while JMU's dialogue centers on positive, campus-focused texts.
ODU's student dialogue focuses more on off-campus issues such as Norfolk neighborhoods, crime, and housing. ODU's data shows more negative statements associated with off-campus terms like Norfolk, Hampton Roads, graffiti, and parking. Meanwhile, JMU's student dialogue focuses heavily on on-campus spaces like campus buildings, dorms, dining halls, and student life. JMU shows more positive statements associated with on-campus terms like the Quad, dorms, and certain campus events.
We will compare the on- versus off-campus focus of the two universities by looking at university buildings, dorms, and campus events for JMU, while for ODU we will focus on the city of Norfolk and its surrounding roads and areas.
Different Trends around ODU
These visualization steps let you insert an interactive Voyant visualization comparing your two data sets directly into your notebook. This lets you visually analyze patterns like word usage, sentiment, or topic emphasis between the corpora. Embedding the Voyant visualizations makes the analysis of the two corpora clearer, more credible, and more engaging.
The visualization step helps test your hypothesis and ideas: it either confirms your prediction by showing patterns that support your ideas, or complicates it by revealing unexpected overlaps or contradictions between the two data sets.
2.2 Visualization 2 [Change Title Depending on Visualization]¶
These visualization steps let you insert an interactive Voyant visualization comparing your two data sets directly into your notebook. This lets you visually analyze patterns like word usage, sentiment, or topic emphasis between the corpora. Embedding the Voyant visualizations makes the analysis of the two corpora clearer, more credible, and more engaging.
The visualization step helps test your hypothesis and ideas: it either confirms your prediction by showing patterns that support your ideas, or complicates it by revealing unexpected overlaps or contradictions between the two data sets. It also helped our group support our hypothesis by showing on the maps which locations each campus focused on. For instance, ODU focuses more on Norfolk and its surrounding roads and areas, while JMU focuses more on campus.
Part 3: Data Cleaning and Refinement¶
📋 About _processed.csv Files¶
Each group has been provided with a _processed.csv file. This file includes all of the necessary data to create sentiment maps. The issue is that this spatial data is still in its raw form and contains erroneous entries. In this section, you will take your _processed.csv file and clean it up.
For the sake of example, a UNC dataset has been preloaded to show what the result should look like.
⚠️ Note: Delete this cell before turning in the final version.
# =============================================================================
# SETUP: Import Libraries and Load Data
# =============================================================================
# This cell sets up all the tools we need for spatial sentiment analysis
# Force reload to pick up any changes to data_cleaning_utils
# This ensures we get the latest version of our custom functions
import importlib
import sys
if 'data_cleaning_utils' in sys.modules:
    importlib.reload(sys.modules['data_cleaning_utils'])
# Core data analysis library - like Excel but for Python
import pandas as pd
# Import our custom functions for cleaning and analyzing location data
from data_cleaning_utils import (
clean_institution_dataframe, # Standardizes and cleans location data
get_data_type_summary, # Shows what types of data we have
get_null_value_summary, # Identifies missing data
create_location_counts, # Counts how often places are mentioned
create_location_sentiment, # Calculates average emotions by location
create_time_animation_data, # Prepares data for animated time series
)
# Interactive plotting library - creates maps and charts
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as pyo
# =============================================================================
# CONFIGURE PLOTLY FOR HTML EXPORT
# =============================================================================
# Configure Plotly for optimal HTML export compatibility
# Method 1: Set renderer for HTML export (use 'notebook' for Jupyter environments)
pio.renderers.default = "notebook"
# Method 2: Configure Plotly for offline use (embeds JavaScript in HTML)
pyo.init_notebook_mode(connected=False) # False = fully offline, no external dependencies
# Method 3: Set template for clean HTML appearance
pio.templates.default = "plotly_white"
# Method 4: Configure Plotly to include plotly.js in HTML exports
import plotly
plotly.offline.init_notebook_mode(connected=False)
# Load the cleaned JMU Reddit data (already processed and ready to use)
# This contains: posts, locations, coordinates, sentiment scores, and dates
df_jmu = pd.read_pickle("assets/data/jmu_reddit_geoparsed_clean.pickle")
# =============================================================================
# LOAD YOUR INSTITUTION'S DATA
# =============================================================================
# Replace the group number and institution name with your assigned data
# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number (e.g., "group_1", "group_2", etc.)
# Replace "UNC_processed.csv" with your institution's file name
df_institution = pd.read_csv("group_data_packets/group_6/python/UNC_processed.csv")
# =============================================================================
# CREATE RAW LOCATION MAP (Before Cleaning)
# =============================================================================
# This shows the "messy" data before we fix location errors
# You'll see why data cleaning is essential!
# STEP 1: Count how many times each place is mentioned
# Group identical place names together and count occurrences
place_counts = df_institution.groupby('place').agg({
'place': 'count', # Count how many times each place appears
'latitude': 'first', # Take the first latitude coordinate for each place
'longitude': 'first', # Take the first longitude coordinate for each place
'place_type': 'first' # Take the first place type classification
}).rename(columns={'place': 'count'}) # Rename the count column for clarity
# STEP 2: Prepare data for mapping
# Reset index makes 'place' a regular column instead of an index
place_counts = place_counts.reset_index()
# Remove any places that don't have valid coordinates (latitude/longitude)
# This prevents errors when trying to plot points on the map
place_counts = place_counts.dropna(subset=['latitude', 'longitude'])
# STEP 3: Create interactive scatter map
# Each dot represents a place, size = how often it's mentioned
fig = px.scatter_map(
place_counts, # Our prepared data
lat='latitude', # Y-coordinate (north-south position)
lon='longitude', # X-coordinate (east-west position)
size='count', # Bigger dots = more mentions
hover_name='place', # Show place name when hovering
hover_data={ # Additional info in hover tooltip
'count': True, # Show mention count
'place_type': True, # Show what type of place it is
'latitude': ':.4f', # Show coordinates with 4 decimal places
'longitude': ':.4f'
},
size_max=25, # Maximum dot size on map
zoom=4, # How zoomed in the map starts (higher = closer)
title='Raw Location Data: Places Mentioned in UNC Reddit Posts',
center=dict(lat=35.5, lon=-80) # Center map on North Carolina for UNC
)
# STEP 4: Customize map appearance
fig.update_layout(
map_style="carto-positron", # Clean, light map style
width=800, # Map width in pixels
height=600, # Map height in pixels
title_font_size=16, # Title text size
title_x=0.5 # Center the title
)
# Configure for HTML export compatibility
fig.show(config={'displayModeBar': True, 'displaylogo': False})
3.1 Toponym Misalignment Analysis¶
📊 Action Items¶
Study the map of your institution alongside the .csv file. Try to determine some of the major mistakes the geoparser has made.
💡 Example: In the map above, Carolina is placed on the North Carolina and South Carolina border. In all likelihood, Reddit posters mean North Carolina when they post Carolina.
✍️ Writing Task¶
Write a description of the major toponym misalignments in your dataset.
💡 Sample Description: "The University of North Carolina's home campus is in Chapel Hill, NC and most locations should appear around those coordinates. There are some major erroneous locations, however. In the main, Chapel Hill has been placed in Tennessee, and likewise there is a Hill Chapel in Virginia, but this likely refers to Chapel Hill in NC as well."
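One way to surface these misalignments programmatically is to rank places by how far the geoparser put them from campus. The sketch below is hypothetical (the `flag_far_places` helper and the sample rows are illustrations, not part of the assignment's utilities) and uses a rough degrees-based distance, which is good enough for spotting outliers:

```python
import pandas as pd
import numpy as np

def flag_far_places(df, campus_lat, campus_lon, top_n=10):
    """Rank places by their farthest geoparsed distance from campus (in raw degrees)."""
    out = df.dropna(subset=["latitude", "longitude"]).copy()
    # Rough planar distance in degrees -- fine for ranking outliers, not for measurement
    out["dist_deg"] = np.sqrt(
        (out["latitude"] - campus_lat) ** 2 + (out["longitude"] - campus_lon) ** 2
    )
    return (
        out.groupby("place")["dist_deg"].max()
        .sort_values(ascending=False)
        .head(top_n)
    )

# Made-up rows mirroring the Chapel Hill example above; use your _processed.csv in practice
sample = pd.DataFrame({
    "place": ["Chapel Hill", "Chapel Hill", "Franklin Street"],
    "latitude": [35.45, 35.91, 35.91],
    "longitude": [-86.05, -79.05, -79.05],  # first row mis-placed in Tennessee
})
print(flag_far_places(sample, campus_lat=35.91, campus_lon=-79.05))
```

Places that float to the top of this ranking are good candidates for the `revised_` columns in the next step.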
3.2 Toponym Refinement¶
In this section you are going to clean up the map above by fixing some of the major locations and getting a sense of the data.
📺 Complete Video Tutorial: https://www.youtube.com/watch?v=TZgRYn1TxGI
🔧 Step 1: Access Your Data¶
- Navigate to your group in `group_data_packets`
- Go to the `python` folder
- Open the CSV file that ends with `_processed.csv` (e.g., `GMU_processed.csv`)
📋 Step 2: Data Structure¶
Your CSV file should contain the following 14 columns:
- school_name - Institution identifier
- unique_id - Record identifier
- date - Post date
- sentences - Text content
- roberta_compound - Sentiment score
- place - Original location name
- latitude - Original coordinates
- longitude - Original coordinates
- revised_place (empty) - Corrected location name
- revised_latitude (empty) - Corrected latitude
- revised_longitude (empty) - Corrected longitude
- place_type (empty) - Location category
- false_positive (empty) - Invalid location flag
- checked_by (empty) - Quality control tracker
🔧 Step 3: Data Cleaning Process¶
📊 Open in Google Sheets: Upload your CSV file to Google Sheets for collaborative editing
🔧 Fix Locations: Enter corrected data in the revised_ columns for mistaken locations
🏷️ Categorize Places: Use the place_type column with these standardized categories:
- Country - National boundaries
- State - State/province level
- County - County/region level
- City - Municipal areas
- Neighborhood - Local districts
- University - Educational institutions
- Building - Specific structures
- Road - Streets and highways
💡 Pro Tip: Create a data validation dropdown in Google Sheets for consistent place types
❌ Mark False Positives: Set false_positive to True for sentences that don't actually reference locations
👥 Track Progress: Enter your name in checked_by to coordinate team efforts
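Before moving on, you can sanity-check your tags against the standardized categories with a short script. This is an optional sketch; `check_place_types` is a hypothetical helper, not part of `data_cleaning_utils`:

```python
import pandas as pd

# The standardized categories from Step 3 above
ALLOWED_TYPES = {"Country", "State", "County", "City",
                 "Neighborhood", "University", "Building", "Road"}

def check_place_types(df):
    """Return place_type values that are not in the standardized category list."""
    tagged = df["place_type"].dropna()
    return sorted(set(tagged) - ALLOWED_TYPES)

# Made-up rows for illustration
sample = pd.DataFrame({"place_type": ["City", "city", "University", None]})
print(check_place_types(sample))
```

Catching stray values like `"city"` here matters because the `place_type_filter` used later in the notebook matches tags case-sensitively.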
💾 Step 4: Export Clean Data¶
- Go to File → Download → Comma Separated Values (.csv)
- Save as: `[institution]_processed_clean.csv` in the same folder
  - Example: `group_2/python/ODU_processed_clean.csv`
⚠️ Note: Delete this cell before turning in the final version.
3.3 Revised Map¶
✍️ Writing Task¶
Explain the major fixes you made to the map and what you learned about the dataset in the process. You can answer any of the following questions:
- What were some places of note on the campus you investigated?
- What appeared to be the main concern of many of the posts?
- What were some places that surprised you?
🔧 Technical Implementation¶
Code Changes Required:
- Run the Python code below to display the map. Remember, this will only work if the .csv file was saved properly.
- You will have to change the code to set it to your group and file.
df_institution_cleaned = pd.read_csv('group_data_packets/group_6/python/UNC_processed_clean.csv')
⚠️ Remember: You are changing the group number and the institution name.
# =============================================================================
# LOAD CLEANED DATA
# =============================================================================
# Load the CSV file you manually cleaned in Google Sheets
# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number
# Replace "UNC_processed_clean.csv" with your institution's cleaned file
df_institution_cleaned = pd.read_csv(
"group_data_packets/group_2/python/odu_processed_clean.csv"
)
# =============================================================================
# APPLY DATA CLEANING FUNCTIONS
# =============================================================================
# Use our custom function to standardize the cleaned data
# Apply the cleaning function to standardize data types and handle missing values
# This function ensures all datasets have the same format for consistent analysis
df_institution_cleaned = clean_institution_dataframe(df_institution_cleaned)
# Display first few rows to verify the cleaning worked properly
# This shows the structure and sample content of your cleaned data
df_institution_cleaned.head()
DataFrame cleaned successfully!
| school_name | unique_id | date | sentences | roberta_compound | place | latitude | longitude | revised_place | revised_latitude | revised_longitude | place_type | false_positive | checked_by | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ODU | ODU_20 | 2024-10-10 19:58:32 | Northern Lights at odu right now | 0.004979 | Northern Lights Hotel | 59.91660 | 30.25000 | Old Dominion University | 36.88530 | -76.30590 | University | True | Alex |
| 1 | ODU | ODU_26 | 2024-10-10 20:29:40 | Been in VA 31 years and this year is the first... | 0.105875 | Virginia | 37.54812 | -77.44675 | Old Dominion University | 36.88530 | -76.30590 | University | False | Alex |
| 2 | ODU | ODU_31 | 2024-10-11 01:14:32 | The CME is moving at a speed estimate of 1200-... | 0.002683 | Earth | 0.00000 | 0.00000 | Earth | 0.00000 | 0.00000 | none | True | Alex |
| 3 | ODU | ODU_33 | 2024-10-10 21:36:30 | Nope, I saw them up here in Manassas, well I s... | -0.036483 | City of Manassas | 38.75095 | -77.47527 | City of Manassas | 38.75095 | -77.47527 | city | False | Alex |
| 4 | ODU | ODU_36 | 2024-10-10 21:38:42 | Isn’t this literally the worst time to be in T... | -0.822557 | Tampa | 27.94752 | -82.45843 | Tampa | 27.94752 | -82.45843 | city | False | Alex |
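A quick tally of the quality-control columns shows how much the manual cleanup changed. This is a plain-pandas sketch (`cleanup_summary` is a hypothetical helper, and the sample rows are made up to mirror the table above):

```python
import pandas as pd

def cleanup_summary(df):
    """Summarize how many rows were flagged as false positives or given a new place name."""
    flagged = int(df["false_positive"].fillna(False).astype(bool).sum())
    relabelled = int((df["revised_place"].notna() & (df["revised_place"] != df["place"])).sum())
    return {"rows": len(df), "false_positives": flagged, "relabelled_places": relabelled}

# Made-up rows mirroring the cleaned ODU table
sample = pd.DataFrame({
    "place": ["Northern Lights Hotel", "Virginia", "Tampa"],
    "revised_place": ["Old Dominion University", "Old Dominion University", "Tampa"],
    "false_positive": [True, False, False],
})
print(cleanup_summary(sample))
```

A high false-positive rate here is itself a finding worth mentioning in your write-up, since it speaks to the geoparser's reliability.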
3.4 Map Customization¶
🔧 Map Adjustment Instructions¶
- The revised map will display below.
- Tweak the visuals on the map by setting the zoom and the center.
- Find the snippet below in the `px.scatter_map` function
- Change the `zoom`, `lat`, and `lon` values to set your view.
zoom=2, # Set the zoom level here
center=dict(lat=38.5, lon=-106), # Set the center of your view here
# =============================================================================
# CREATE CLEANED LOCATION MAP (After Manual Corrections)
# =============================================================================
# This map shows your data AFTER you fixed the location errors
# Compare this to the raw map above to see the improvement!
# STEP 1: Count occurrences using CLEANED/CORRECTED location data
# Now we use 'revised_place' instead of 'place' - these are your corrections!
place_counts = (
df_institution_cleaned.groupby("revised_place") # Group by corrected place names
.agg(
{
"revised_place": "count", # Count mentions of each corrected place
"revised_latitude": "first", # Use corrected latitude coordinates
"revised_longitude": "first", # Use corrected longitude coordinates
"place_type": "first", # Keep place type classification
}
)
.rename(columns={"revised_place": "count"}) # Rename count column for clarity
)
# STEP 2: Prepare data for mapping
place_counts = place_counts.reset_index() # Make 'revised_place' a regular column
# Remove places without valid corrected coordinates
place_counts = place_counts.dropna(subset=["revised_latitude", "revised_longitude"])
# STEP 3: Create the cleaned location map
fig = px.scatter_map(
place_counts,
lat="revised_latitude", # Use corrected Y-coordinates
lon="revised_longitude", # Use corrected X-coordinates
size="count", # Dot size = mention frequency
hover_name="revised_place", # Show corrected place name on hover
hover_data={
"count": True, # Show how many mentions
"place_type": True, # Show place category
"revised_latitude": ":.4f", # Show corrected coordinates
"revised_longitude": ":.4f",
},
size_max=45, # Maximum dot size
title="Places around ODU off campus",
zoom=12, # 📝 TO DO: Adjust zoom level for your region
center=dict(lat=36.8853, lon=-76.3059), # 📝 TO DO: Adjust center coordinates
)
# STEP 4: Customize map appearance
fig.update_layout(
map_style="carto-positron", # Clean, readable map style
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})
Revision Insights¶
✍️ Write any new spatial insights you arrived at now that you have fixed the data.
Part 4: Spatial Comparison¶
In this section you are going to compare the spatial distribution of JMU and your institution. You will create two maps using the custom create_location_counts() function.
🔧 Function Parameters¶
This function has two key parameters:
- `minimum_count=` - Sets the minimum number of times a location has to appear before it registers on the map. Setting this to the default `2` means that a location mentioned only once will not appear.
- `place_type_filter=` - This uses the `place_type` column to keep only the types of places you want. The default is `None`, but adding a list of place types, e.g. `place_type_filter=["University", "Building"]`, shows only those places. In the sample below, only "State", "City", and "Country" places will show up.
⚠️ NOTE: This does mean you would have had to tag place types properly in the cleanup process
💡 Example Usage¶
create_location_counts(
df_institution_cleaned,
minimum_count=2,
place_type_filter=["State", "City", "Country"]
)
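For intuition, here is a plain-pandas approximation of what `create_location_counts()` appears to do, based on the parameters described above; `location_counts_sketch` is a hypothetical stand-in, not the helper's actual source:

```python
import pandas as pd

def location_counts_sketch(df, minimum_count=2, place_type_filter=None):
    """Approximate create_location_counts: count mentions per revised place, then filter."""
    if place_type_filter is not None:
        df = df[df["place_type"].isin(place_type_filter)]
    counts = (
        df.groupby("revised_place")
        .agg(count=("revised_place", "size"),
             revised_latitude=("revised_latitude", "first"),
             revised_longitude=("revised_longitude", "first"),
             place_type=("place_type", "first"))
        .reset_index()
    )
    return counts[counts["count"] >= minimum_count]

# Made-up rows for illustration
sample = pd.DataFrame({
    "revised_place": ["Norfolk", "Norfolk", "Ghent"],
    "revised_latitude": [36.85, 36.85, 36.87],
    "revised_longitude": [-76.29, -76.29, -76.30],
    "place_type": ["City", "City", "Neighborhood"],
})
print(location_counts_sketch(sample, minimum_count=2, place_type_filter=["City"]))
```

Note that `isin` matches tags exactly, which is why consistent capitalization of `place_type` during cleanup matters.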
🔧 Customization Instructions¶
⚠️ Important: Make sure that minimum_count and place_type_filter for create_location_counts are the same for both maps.
Visual Customization:
- Tweak the center and zoom of your map to highlight an important contrast
- Set `color_discrete_sequence=px.colors.qualitative.Plotly` to something of your choice
⚠️ Note: Delete this cell for the final version
4.1 JMU Spatial Distribution¶
# =============================================================================
# JMU SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a filtered map showing only certain types of places for JMU
# STEP 1: Use custom function to filter and count JMU locations
# This function applies the same filtering to both datasets for fair comparison
JMU_filtered_locations = create_location_counts(
df_jmu, # JMU Reddit data
minimum_count=2, # Only show places mentioned 2+ times
place_type_filter=['State', 'City', 'Country'] # Only these place types
)
# STEP 2: Create colored scatter map
# Each place type gets a different color to show spatial patterns
fig = px.scatter_map(
JMU_filtered_locations,
lat="revised_latitude",
lon="revised_longitude",
size="count", # Dot size = mention frequency
color="place_type", # Different colors for different place types
hover_name="revised_place",
hover_data={
"count": True,
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f",
},
size_max=25,
zoom=12, # 📝 TO DO: Adjust to highlight interesting patterns
title="On campus places at JMU",
center=dict(lat=38.435318880428284, lon=-78.86980844807114), # 📝 TO DO: Center on area of interest
color_discrete_sequence=px.colors.qualitative.Plotly # Categorical color palette
)
# STEP 3: Customize layout
fig.update_layout(
map_style="carto-positron",
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})
# =============================================================================
# YOUR INSTITUTION'S SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a comparable map for your institution using identical filtering
# STEP 1: Apply the same filtering to your institution's data
# Using identical parameters ensures fair comparison with JMU
institution_filtered_locations = create_location_counts(
df_institution_cleaned, # Your cleaned institution data
minimum_count=2, # Same minimum as JMU map
place_type_filter=["city", "County", "road", "region", "state"]  # Place types as tagged in our cleaned ODU data
)
# STEP 2: Create matching visualization
# Keep all settings the same as JMU map for direct comparison
fig_institution_cleaned = px.scatter_map(
institution_filtered_locations,
lat="revised_latitude",
lon="revised_longitude",
size="count",
color="place_type", # Same color coding as JMU map
hover_name="revised_place",
hover_data={
"count": True,
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f",
},
size_max=25, # Same size scale as JMU
zoom=10, # 📝 TO DO: Adjust for your region
title="ODU Focus in Norfolk", # 📝 TO DO: Update institution name
center=dict(lat=36.84681, lon=-76.28522), # 📝 TO DO: Center on your region
color_discrete_sequence=px.colors.qualitative.Plotly, # Same colors as JMU
)
# STEP 3: Apply identical layout settings
fig_institution_cleaned.update_layout(
map_style="carto-positron", # Same style as JMU map
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig_institution_cleaned.show(config={'displayModeBar': True, 'displaylogo': False})
4.2 Spatial Analysis¶
✍️ Writing Task¶
Write a paragraph on an important spatial difference or similarity between the two datasets that confirms or complicates your hypothesis.
💡 Example: "While we theorized that UNC would have a significant number of posts about the Southeast, the mapping data does not reveal this. Instead, the Reddit feed rarely speaks about states outside of North Carolina, and when it does it is about institutions out west."
Part 5: Sentiment Analysis Comparison¶
In this section, you are going to compare the sentiments by location for each institution. You are going to do so by first customizing the create_location_sentiment() function.
🔧 Function Parameters¶
This takes the same parameters as the create_location_counts() function above:
- `minimum_count=` - Sets the minimum number of times a location has to appear before it registers on the map. Setting this to the default `2` means that a location mentioned only once will not appear.
- `place_type_filter=` - This uses the `place_type` column to keep only the types of places you want. The default is `None`, but adding a list of place types, e.g. `place_type_filter=["University", "Building"]`, shows only those places.
  - 💡 Tip: You might consider showing only one type of place if it helps make your argument. For example, if you are investigating school spirit, it makes the most sense to look at universities and buildings.
  - ⚠️ NOTE: `place_type_filter` only works if you tagged places properly in the cleanup process.
💡 Example Usage¶
create_location_sentiment(
df_jmu,
minimum_count=2,
place_type_filter=None # Include all place types
)
🔧 Customization Instructions¶
Visual Optimization:
- Tweak the center and zoom of your map to highlight an important contrast
- Experiment with different divergent color scales to optimize the visuals
  - Change `RdYlGn` in `color_continuous_scale="RdYlGn"` to something of your choice
  - Color Reference: https://plotly.com/python/builtin-colorscales/#builtin-diverging-color-scales
- Experiment with different map templates to optimize visuals:
  - Change `carto-positron` in `map_style="carto-positron"`
  - Options include: 'basic', 'carto-darkmatter', 'carto-darkmatter-nolabels', 'carto-positron', 'carto-positron-nolabels', 'carto-voyager', 'carto-voyager-nolabels', 'dark', 'light', 'open-street-map', 'outdoors', 'satellite', 'satellite-streets', 'streets', 'white-bg'
⚠️ Note: Delete this cell for the final version
# =============================================================================
# JMU SENTIMENT ANALYSIS MAP
# =============================================================================
# Shows the EMOTIONAL tone of how JMU students talk about different places
# Red = negative emotions, Green = positive emotions
# STEP 1: Calculate average sentiment scores by location
# This function groups identical locations and averages their sentiment scores
df_jmu_sentiment = create_location_sentiment(
df_jmu, # JMU Reddit data with sentiment scores
minimum_count=2, # Only places mentioned 2+ times (for reliability)
place_type_filter=None # Include all place types for comprehensive view
)
# STEP 2: Create sentiment visualization map
# Color represents emotional tone: Green = positive, Red = negative, Yellow = neutral
fig_sentiment = px.scatter_map(
df_jmu_sentiment,
lat="revised_latitude",
lon="revised_longitude",
size="count", # Larger dots = more mentions (more reliable sentiment)
color="avg_sentiment", # Color intensity = emotional tone
color_continuous_scale="RdYlGn", # Red-Yellow-Green scale (Red=negative, Green=positive)
hover_name="revised_place",
hover_data={
"count": True, # How many posts contributed to this sentiment
"avg_sentiment": ":.3f", # Average sentiment score (3 decimal places)
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f",
},
size_max=25,
zoom=2, # 📝 TO DO: Adjust to focus on interesting patterns
title="Average Sentiment by Location in JMU Reddit Posts",
center=dict(lat=38.5, lon=-86), # 📝 TO DO: Center on region of interest
)
# STEP 3: Customize layout for sentiment analysis
fig_sentiment.update_layout(
map_style="carto-positron", # Clean background to highlight sentiment colors
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})
# =============================================================================
# YOUR INSTITUTION'S SENTIMENT ANALYSIS MAP
# =============================================================================
# Compare emotional patterns between your institution and JMU
# STEP 1: Calculate sentiment for your institution using identical methods
institution_sentiment = create_location_sentiment(
df_institution_cleaned, # Your cleaned institution data
minimum_count=2, # Same minimum as JMU (ensures fair comparison)
place_type_filter=None # Same filter as JMU (include all place types)
)
# STEP 2: Create matching sentiment visualization
# Use identical settings to JMU map for direct comparison
fig_institution_sentiment = px.scatter_map(
institution_sentiment,
lat="revised_latitude",
lon="revised_longitude",
size="count",
color="avg_sentiment",
color_continuous_scale="RdYlGn", # Same color scale as JMU map
hover_name="revised_place",
hover_data={
"count": True,
"avg_sentiment": ":.3f",
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f",
},
size_max=25, # Same size scale as JMU
zoom=2, # 📝 TO DO: Adjust for your region
title="Average Sentiment by Location in ODU Reddit Posts", # 📝 TO DO: Update institution name
center=dict(lat=35.5, lon=-80), # 📝 TO DO: Center on your institution's region
)
# STEP 3: Apply identical layout for comparison
fig_institution_sentiment.update_layout(
map_style="carto-positron", # Same background as JMU map
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig_institution_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})
Sentiment Comparison Analysis¶
✍️ Writing Task: You analyzed the place-based sentiments in both corpora. Reflect on how this confirms or complicates your hypothesis.
Part 6: Time Series Animation Analysis¶
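The rolling average that `create_time_animation_data()` applies can be pictured with plain pandas. This is a conceptual sketch, assuming monthly grouping of the `roberta_compound` scores; `rolling_monthly_sentiment` is hypothetical, and the helper's real implementation may differ:

```python
import pandas as pd

def rolling_monthly_sentiment(df, window_months=3):
    """Monthly mean sentiment per place, smoothed with a rolling window."""
    df = df.copy()
    df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")
    monthly = (
        df.groupby(["revised_place", "month"])["roberta_compound"]
        .mean()
        .reset_index(name="avg_sentiment")
        .sort_values(["revised_place", "month"])
    )
    # Rolling window smooths month-to-month noise; min_periods=1 keeps early months
    monthly["rolling_avg_sentiment"] = (
        monthly.groupby("revised_place")["avg_sentiment"]
        .transform(lambda s: s.rolling(window_months, min_periods=1).mean())
    )
    return monthly

# Made-up rows for illustration
sample = pd.DataFrame({
    "revised_place": ["Norfolk"] * 3,
    "date": ["2024-01-15", "2024-02-15", "2024-03-15"],
    "roberta_compound": [0.9, -0.3, 0.0],
})
print(rolling_monthly_sentiment(sample))
```

The smoothing is why a single very negative post does not flip a place's color in the animation; each frame reflects the surrounding window of months.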
# =============================================================================
# ANIMATED TIME SERIES: SENTIMENT CHANGES OVER TIME
# =============================================================================
# Watch how places accumulate mentions and sentiment changes over time
# This reveals temporal patterns in student discussions
# STEP 1: Prepare animation data with rolling averages
# This function creates monthly frames showing cumulative growth and sentiment trends
institution_animation = create_time_animation_data(
df_institution_cleaned, # Your cleaned institution data
window_months=3, # 3-month rolling average (smooths out noise)
minimum_count=2, # Only places with 2+ total mentions
place_type_filter=None # Include all place types (📝 TO DO: experiment with filtering)
)
# STEP 2: Create animated scatter map
# Each frame represents one month, showing cumulative mentions and current sentiment
fig_animated = px.scatter_map(
institution_animation,
lat="revised_latitude",
lon="revised_longitude",
size="cumulative_count", # Dot size = total mentions up to this point in time
color="rolling_avg_sentiment", # Color = 3-month average sentiment (smoother than daily)
animation_frame="month", # Each frame = one month of data
animation_group="revised_place", # Keep same places connected across frames
hover_name="revised_place",
hover_data={
"cumulative_count": True, # Total mentions so far
"rolling_avg_sentiment": ":.3f", # Smoothed sentiment score
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f"
},
color_continuous_scale="RdYlGn", # Same sentiment colors as static maps
size_max=30, # Slightly larger max size for animation visibility
zoom=2, # 📝 TO DO: Adjust zoom for your region
title="Institution Reddit Posts: Cumulative Location Mentions & Rolling Average Sentiment Over Time",
center=dict(lat=35.5, lon=-80), # 📝 TO DO: Center on your institution's area
range_color=[-0.5, 0.5] # Fixed color range for consistent comparison across time
)
# STEP 3: Customize animation settings and layout
fig_animated.update_layout(
map_style="carto-positron",
width=800,
height=600,
title_font_size=16,
title_x=0.5,
coloraxis_colorbar=dict( # Customize the sentiment legend
title="Rolling Avg<br>Sentiment",
tickmode="linear",
tick0=-0.5, # Start legend at -0.5 (most negative)
dtick=0.25 # Tick marks every 0.25 points
)
)
# STEP 4: Set animation timing (in milliseconds)
# 📝 TO DO: Experiment with these values for optimal viewing
fig_animated.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 800 # Time between frames
fig_animated.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300 # Transition smoothness
# Display with HTML export configuration
fig_animated.show(config={'displayModeBar': True, 'displaylogo': False})
✍️ Time Series Analysis¶
Explain how you optimized the visuals for your time series visualization and how this confirms or complicates your hypothesis.
Our original hypothesis was that ODU's Reddit focused mainly on off-campus issues, with a higher emphasis on negative statements, while JMU's Reddit focused the majority of its attention on on-campus places and activities, with an emphasis on positive statements. Through our research, we concluded that our hypothesis was confirmed. We can argue this with confidence for several reasons, with our research backing it. One is the spatial comparison maps, which demonstrated the differences between the two: JMU posts clustered around campus landmarks, while ODU's posts were scattered around Norfolk neighborhoods and nearby roads. Another major supporting factor was the sentiment analysis, which effectively demonstrated our predicted emotional differences. Positive scores were associated with on-campus terms like dorms and the Quad, while negative scores were associated with off-campus terms like parking and crime for ODU.

I would argue that the biggest shortcoming of our analysis was our sample size. The majority of the time a bigger sample size is better, but pulling posts only from Reddit could be a major problem. Reddit isn't necessarily trending at this point, and you are unlikely to capture the general public's feelings toward a school from Reddit alone, since it is a more niche platform for college students than, say, Instagram or TikTok. Having said this, I think our research could have been improved if we had pulled posts from multiple platforms. Another limiting factor was the original geoparsing tool, which produced many false positives and inaccuracies and forced us to clean the document manually. Not only did this require more work than I had hoped was necessary, but human error is also a very real possibility that has likely crept into our research.
I think it could definitely prove worthwhile to find a more reliable, task-specific geoparser that might even recognize regional or age-specific slang. Given all of this, our findings have given me a good idea of the implications of spatial sentiment analysis.